Course assessment using multi-stage pre/post testing and the
components of normalized change

David R. Dellwo

Abstract: A multi-stage pre/post testing scheme is developed to gauge course 
effectiveness using gain and loss components of normalized change. The 
components, unlike normalized change itself, can be used to distinguish courses
that promote acquisition as well as retention of information from courses that 
promote acquisition at the expense of retention or retention at the expense of 
acquisition. The technique is employed to study the effectiveness of a course in 
differential calculus taught using the studio method, a form of interactive 
engagement. 
I. Introduction.

Assessment of learning is a recurrent and sometimes controversial theme in higher education. 
The literature is replete with conflicting advice on how best to conduct an assessment, see Hake 
(2004, 2006) and Suskie (2004a, 2004b). However, pre- and post-testing evaluation is often 
cited as a commonsense approach. Perhaps no one has expressed this point of view more 
graphically than Bond (2005), who wrote in a Carnegie Perspective: 

If one wished to know what knowledge or skill Johnny has acquired over the course of a 
semester, it would seem a straightforward matter to assess what Johnny knew at the 
beginning of the semester and reassess him with the same or equivalent instrument at the 
end of the semester. 

Theoretical justification for the technique was provided by Willett (1989a, 1989b, 1994,
1997) and Rogosa (1995). In particular, they demonstrated that additional rounds of pre- and 
post-testing dramatically improve the method’s reliability. 

The multi-stage assessment scheme employed here partitions the course into several 
instructional periods. Each period is bracketed by pre-instruction and post-instruction tests, with 
the post-test for one period serving as the pre-test for the next period. This arrangement creates 
an opportunity to study the marginal (snapshot) effectiveness of individual instructional periods 
or alternatively to combine individual periods and study the cumulative (longitudinal) 
effectiveness of several combined instructional periods. 

The two analyses provide information on different aspects of course effectiveness. A
cumulative analysis is used to determine whether repeated exposure to course material over
multiple periods of instruction increases the likelihood of students acquiring and retaining baseline
knowledge. A marginal analysis is used to determine whether course design is flexible enough to
continually rebalance acquisition and retention efforts as student performance changes from one
instructional period to the next.

The method used to quantify changes in performance is a definitive feature of any 
pre/post testing design. The following index is frequently used to measure the change in group 
performance from a pre-instruction to a post-instruction test:

\[
g = \frac{\{\text{average grade on the post-instruction test}\} - \{\text{average grade on the pre-instruction test}\}}{100\% - \{\text{average grade on the pre-instruction test}\}} \tag{1}
\]

The ratio in (1), often referred to as normalized change, expresses the difference between 
average test scores as a fraction of the maximum possible difference between these scores. 

Hovland et al. (1949) used (1) to quantify the effectiveness of instructional films. Hake 
(1998) used (1) to gauge the relative effectiveness of various instructional techniques employed 
in introductory physics courses. Cummings et al. (1999) used (1) to evaluate innovations in 
studio physics. Meltzer (2002) used (1) to explore the relationship between mathematics 
preparation and concept learning in physics. These important studies relied on the intuitive 
notion that when comparing two courses: 

The course with the larger value of normalized change (g) is the more effective course. (2) 

Unfortunately, as demonstrated here, this classic assessment rule can lead to counterintuitive 
conclusions. 

This paper employs an alternate assessment rule obtained by decomposing normalized 
change (1) into component measures: 

\[
g = G - \gamma L \tag{3}
\]

Here G is a normalized gain measuring the likelihood that a mistake on the group’s
pre-instruction test is corrected on the post-instruction test. Similarly, L is a normalized loss
measuring the likelihood that a correct response on the group’s pre-instruction test is rendered
incorrect on the post-instruction test. The non-negative parameter γ is a renormalization factor
dependent on the population’s pre-instruction performance. Consequently, (3) expresses
normalized change (1) as the difference between two non-negative indices, normalized gain and
renormalized loss. The decomposition (3) gives rise to an alternative assessment rule that avoids
the counterintuitive conclusions associated with (2), and reads in part:

The course with the larger value of normalized gain (G) and smaller
value of renormalized loss (γL) is the more effective course. (4)

The derivation of (3) is discussed in the next section. Section III discusses assessment 
standards and value-added measurement of effectiveness expressed in terms of the components 
of normalized change. Multi-stage assessment is discussed in section IV. The application is 
presented in section V. The concluding remarks in section VI discuss the implications of (3) for 
past and future research.

II. Normalized Change and Its Components 

Normalized change (1) for a group of N students taking a diagnostic test with M questions can be 
expressed in the following form: 


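\[
g = \frac{\theta_{post} - \theta_{pre}}{1 - \theta_{pre}} \tag{5.a}
\]

where

\[
\theta_{pre} = \frac{\{\text{Number of questions students answer correctly on the pre-instruction test}\}}{NM} \tag{5.b}
\]

\[
\theta_{post} = \frac{\{\text{Number of questions students answer correctly on the post-instruction test}\}}{NM} \tag{5.c}
\]

The derivation of (3) is based on the following observation.

\[
\left\{\begin{array}{l}\text{Number of questions}\\ \text{students answer correctly}\\ \text{on the post-instruction test}\end{array}\right\}
-
\left\{\begin{array}{l}\text{Number of questions}\\ \text{students answer correctly}\\ \text{on the pre-instruction test}\end{array}\right\}
=
\left\{\begin{array}{l}\text{Number of questions students answer}\\ \text{correctly on the post-instruction test}\\ \text{and incorrectly on the pre-instruction test}\end{array}\right\}
-
\left\{\begin{array}{l}\text{Number of questions students answer}\\ \text{incorrectly on the post-instruction test}\\ \text{and correctly on the pre-instruction test}\end{array}\right\}
\]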


This observation together with definitions (5.b) and (5.c) implies

\[
\theta_{post} - \theta_{pre} = (1 - \theta_{pre})\,G - \theta_{pre}\,L \tag{6}
\]

where

\[
G = \frac{\{\text{Number of questions students answer correctly on the post-instruction test and incorrectly on the pre-instruction test}\}}{\{\text{Number of questions students answer incorrectly on the pre-instruction test}\}} \tag{7.a}
\]

\[
L = \frac{\{\text{Number of questions students answer incorrectly on the post-instruction test and correctly on the pre-instruction test}\}}{\{\text{Number of questions students answer correctly on the pre-instruction test}\}} \tag{7.b}
\]

The numerator in (7.a) is the number of questions on which students demonstrate a gain
in knowledge and the denominator is the maximum possible gain. Consequently, the ratio G is a
normalized gain measuring the conditional probability (Ross, 2004) that a mistake on the group’s
pre-instruction test is corrected on the post-instruction test.

Similarly, the numerator in (7.b) is the number of questions on which students 
demonstrate a loss in knowledge and the denominator is the maximum possible loss. 
Consequently, the ratio L is a normalized loss measuring the conditional probability that a 
correct response on the group’s pre-instruction test is rendered incorrect on the post-instruction
test. 

In summary, equation (6) expresses change in test score as a difference between the 
fraction of questions on which students demonstrate a gain in knowledge and the fraction on 
which they demonstrate a loss of knowledge. Finally, to obtain (3) define 


\[
\gamma = \frac{\theta_{pre}}{1 - \theta_{pre}} \tag{7.c}
\]

and divide (6) by (1 − θ_pre). The scaling factor (7.c) is a non-negative parameter whose value is
larger than 1 if θ_pre > 1/2, equal to 1 if θ_pre = 1/2, and smaller than 1 if θ_pre < 1/2. The scale γ is
referred to as the group’s aspect ratio and specifies the odds that the group gives a correct answer
on the pre-instruction test.
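The definitions (5.a) through (7.c) can be checked numerically. The following is a minimal sketch, not the author's computation: the function name `components` and the counts passed to it are hypothetical, chosen only to illustrate that the decomposition (3) holds exactly when the components are computed from question counts.

```python
from fractions import Fraction

def components(n_total, n_correct_pre, n_gained, n_lost):
    """Return (theta_pre, theta_post, G, L, gamma, g) from question counts.

    n_total       -- N*M, the number of student-question pairs
    n_correct_pre -- pairs answered correctly on the pre-instruction test
    n_gained      -- pairs incorrect on the pre-test and correct on the post-test
    n_lost        -- pairs correct on the pre-test and incorrect on the post-test
    """
    n_incorrect_pre = n_total - n_correct_pre
    if n_correct_pre == 0 or n_incorrect_pre == 0:
        raise ValueError("G, L, or gamma is undefined for this pre-test result")
    theta_pre = Fraction(n_correct_pre, n_total)                       # (5.b)
    theta_post = Fraction(n_correct_pre + n_gained - n_lost, n_total)  # (5.c)
    G = Fraction(n_gained, n_incorrect_pre)                            # (7.a)
    L = Fraction(n_lost, n_correct_pre)                                # (7.b)
    gamma = theta_pre / (1 - theta_pre)                                # (7.c)
    g = (theta_post - theta_pre) / (1 - theta_pre)                     # (5.a)
    return theta_pre, theta_post, G, L, gamma, g

# Hypothetical group: 100 student-question pairs, 60 correct on the pre-test,
# 24 mistakes corrected and 6 correct answers lost on the post-test.
theta_pre, theta_post, G, L, gamma, g = components(100, 60, 24, 6)
assert g == G - gamma * L  # decomposition (3) holds exactly
print(float(G), float(L), float(gamma), float(g))
```

Exact rational arithmetic is used here only so that the identity can be asserted without rounding error; floating-point values would serve equally well in practice.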

III. Value-Added Measurement of Course Effectiveness. 


The following criteria are used in this study to assess the relative effectiveness of two courses (A 
and B). 


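i. A is more effective than B if:

\[
\left\{\begin{array}{l}
G_A > G_B \ \text{and}\ \gamma_A L_A \le \gamma_B L_B \\
\text{or} \\
G_A \ge G_B \ \text{and}\ \gamma_A L_A < \gamma_B L_B
\end{array}\right. \tag{8.a}
\]

ii. A and B are equally effective if:

\[
G_A = G_B \ \text{and}\ \gamma_A L_A = \gamma_B L_B \tag{8.b}
\]

iii. A and B are not comparable if:

\[
\left\{\begin{array}{l}
G_A > G_B \ \text{and}\ \gamma_A L_A > \gamma_B L_B \\
\text{or} \\
G_A < G_B \ \text{and}\ \gamma_A L_A < \gamma_B L_B
\end{array}\right. \tag{8.c}
\]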


Notice, (8.a) restates (4) in algebraic form and defines a consistent ordering of courses in
the sense that if A is more effective than B and B is more effective than C, then A is more 
effective than C. Also, (8.c) offers an assessment option not offered by (2): namely, some 
courses are not comparable. 
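The criteria (8.a) through (8.c) can be sketched as a small classification routine. This is an illustrative reading rather than the study's procedure: the function name `compare`, the tolerance parameter `tol` (a hypothetical device for treating statistically insignificant differences as ties), and the sample values are all assumptions, and (8.a) is read as allowing a tie in one of the two components.

```python
def compare(G_a, loss_a, G_b, loss_b, tol=0.0):
    """Classify courses A and B by criteria (8.a)-(8.c).

    loss_a and loss_b are the renormalized losses (gamma * L) of each course.
    Differences no larger than tol are treated as ties (an assumption).
    Returns 'A', 'B', 'equal', or 'not comparable'.
    """
    d_gain = G_a - G_b
    d_loss = loss_a - loss_b
    if abs(d_gain) <= tol:
        d_gain = 0.0
    if abs(d_loss) <= tol:
        d_loss = 0.0
    if d_gain == 0 and d_loss == 0:
        return "equal"            # (8.b)
    if d_gain >= 0 and d_loss <= 0:
        return "A"                # (8.a)
    if d_gain <= 0 and d_loss >= 0:
        return "B"                # (8.a) with the roles of A and B exchanged
    return "not comparable"       # (8.c)

# Hypothetical courses: larger gain and smaller loss, then larger gain and larger loss.
print(compare(0.70, 0.10, 0.50, 0.15))  # A
print(compare(0.70, 0.30, 0.50, 0.15))  # not comparable
```

Returning a fourth outcome, rather than forcing a ranking, is what separates this rule from the classic rule (2), which always orders two courses by g alone.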

If A is a more effective course than B in the sense of (8.a), then G_A − G_B is a value-added
measure of improved effectiveness due to larger gains, see (Suskie, 2004a). Also, γ_B L_B − γ_A L_A is
a value-added measure of improved effectiveness due to smaller renormalized losses experienced
by students in the more effective course. Consequently,

\[
g_A - g_B = (G_A - G_B) + (\gamma_B L_B - \gamma_A L_A) \tag{9}
\]

is a value-added measure of the total improvement in effectiveness when (8.a) or equivalently (4)
applies and one course can claim the larger gains as well as the smaller renormalized losses.

On the other hand, (9) is not a measure of total improvement in effectiveness when (8.c) 
applies and neither course can claim both larger gains and smaller renormalized losses. In this 
case, one of G_A − G_B and γ_B L_B − γ_A L_A is positive while the other is negative; so (9) is the
difference between two value-added measures:

\[
g_A - g_B = (G_A - G_B) + (\gamma_B L_B - \gamma_A L_A) =
\begin{cases}
(\gamma_B L_B - \gamma_A L_A) - (G_B - G_A) & \text{if } (G_A - G_B) < 0 \\
(G_A - G_B) - (\gamma_A L_A - \gamma_B L_B) & \text{if } (\gamma_B L_B - \gamma_A L_A) < 0
\end{cases}
\tag{10}
\]

That is, g_A − g_B is a difference between added effectiveness due to larger gains in one course and
added effectiveness due to smaller renormalized losses in the other course.

Finally, in view of (10), the classic assessment rule (2) declares A more effective than B 
when either of the following applies. 

• The added effectiveness due to smaller renormalized losses in A offsets the added
effectiveness due to larger gains in B. 

• The added effectiveness due to larger gains in A offsets the added effectiveness due to 
smaller renormalized losses in B. 


Of course, neither of these alternatives can form the basis for a pedagogically sound strategy to 
improve learning. 
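A small numerical illustration may help here. The values below are hypothetical and are not drawn from any course in this study; they are chosen so that the second alternative applies, with larger gains in A set against smaller renormalized losses in B.

```latex
\[
\begin{aligned}
\text{Course A:}\quad & G_A = 0.70,\ \gamma_A L_A = 0.30,\ g_A = 0.70 - 0.30 = 0.40 \\
\text{Course B:}\quad & G_B = 0.50,\ \gamma_B L_B = 0.15,\ g_B = 0.50 - 0.15 = 0.35 \\
g_A - g_B &= (G_A - G_B) - (\gamma_A L_A - \gamma_B L_B) = 0.20 - 0.15 = 0.05
\end{aligned}
\]
```

Under these assumed values the classic rule (2) would rank A above B because g_A > g_B, even though students in B lose less of what they knew; under (8.c) the two courses are not comparable.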

IV. Multi-Stage Assessment. 

Most pre/post assessment regimes employ a single instructional period bracketed by identical or 
nearly identical pre- and post-instruction tests. See (Hake, 1998), (Cummings et al., 1999),
(Meltzer, 2002), (Libarkin et al., 2005), and (McConnell et al., 2006). Unfortunately, these
single-stage methods, relying on two tests, cannot gather enough data to detect inevitable 
fluctuations in learning that result from imperfect acquisition and retention of course material. 
For example, a round of pre/post testing cannot detect a difference in performance between a 
student who never learns a key skill and a student who learns and then forgets that skill during 
the term. Similarly, a round of testing cannot distinguish between a student who retains pre- 
instruction knowledge throughout the term and a student who forgets and then relearns that 
knowledge during the term. 

Multi-stage regimes track fluctuations in learning and refine the assessment process by 
combining several single-stage regimens. For example, the two-stage scheme diagramed in
Figure 1 can detect a one-time loss and reacquisition of course material as well as a one-time
acquisition and subsequent loss of material. It is important to note that the inter-period diagnostic
test (T1) serves as a post-instruction test for the first stage as well as a pre-instruction test for the
second stage.
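The limitation of a single round of testing can be made concrete with a short sketch. The four trajectories below are hypothetical score patterns for one student on one question; the function names are illustrative only.

```python
# Each tuple is (S0, S1, S2): 1 = correct, 0 = incorrect on T0, T1, T2.
# The four trajectories are hypothetical, for one student and one question.
trajectories = {
    "never learns": (0, 0, 0),
    "learns, then forgets": (0, 1, 0),
    "retains throughout": (1, 1, 1),
    "forgets, then relearns": (1, 0, 1),
}

def single_stage(scores):
    """Record kept by one pre/post round: only T0 and T2 are observed."""
    return (scores[0], scores[2])

def two_stage(scores):
    """Record kept by the two-stage scheme: T0, T1, and T2 are observed."""
    return tuple(scores)

def indistinguishable(view, a, b):
    """True if the given view records the same data for trajectories a and b."""
    return view(trajectories[a]) == view(trajectories[b])

for a, b in [("never learns", "learns, then forgets"),
             ("retains throughout", "forgets, then relearns")]:
    print(a, "vs", b,
          "| single-stage same:", indistinguishable(single_stage, a, b),
          "| two-stage same:", indistinguishable(two_stage, a, b))
```

Each pair looks identical when only the first and last tests are recorded and is separated once the inter-period test is included.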
[Figure 1: timeline of the two-stage scheme. The 1st instructional period runs from T0 (1st period pre-instruction test) to T1 (inter-period test, which is both a pre- and post-instruction test); the 2nd instructional period runs from T1 to T2 (2nd period post-instruction test).]

Figure 1. The first stage of a two-stage assessment scheme is bracketed by pre- and post-instruction tests T0
and T1. The second stage is bracketed by T1 and T2. The diagnostic tests are identical or nearly identical
instruments designed to assess learning of key skills and concepts.


A. Marginal Analysis of Multi-Stage Schemes. 

A marginal analysis uses the components of normalized change (3) to tabulate changes in
performance relative to pre-instruction levels at each stage of a multi-stage scheme. This
technique can be used to study variations in effectiveness from one instructional period to the
next for a particular course. Alternatively, the approach can be used to compare effectiveness
of two courses during a particular instructional period.

Notice that for a marginal analysis the standard by which effectiveness is determined
changes from period to period. For example, a marginal analysis of the two-stage scheme shown
in Figure 1 might use gains and losses from T0 to T1 as well as from T1 to T2 to study variations
in effectiveness for a single course. In this situation, improving effectiveness from the first to the
second instructional period means the course was more effective in boosting learning relative to
performance on T1 than it was relative to performance on T0.

As a second example, the marginal analysis of a two-stage scheme might be used to
compare the effectiveness of two courses in promoting learning relative to T0 as well as to T1. In
this situation, it may happen that one of the two courses is more effective in promoting learning
relative to T0 while the other is more effective in promoting learning relative to T1.

B. Cumulative Analysis of Multi-Stage Schemes. 

A cumulative analysis tabulates changes in performance over several successive stages of a 
multi-stage scheme by measuring gains and losses from the initial pre-instruction test to each of 
the subsequent post-instruction tests. In contrast to the marginal analysis, a cumulative analysis 
uses performance on T0 as a fixed standard from which to gauge change over successive periods
from T0 to T1, from T0 to T2, from T0 to T3, etc.

This technique can be used to compare a particular course’s effectiveness during the
single period from T0 to T1 with its effectiveness during the two periods from T0 to T2 in helping
students rectify weaknesses revealed by their performance on the initial diagnostic test, T0.
Alternatively, the approach can be used to study the relative effectiveness of two courses in 
promoting learning over the first two, three, or more instructional periods following the initial 
diagnostic test. 
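The difference between the two analyses lies only in which test serves as the baseline. The sketch below assumes hypothetical 0/1 scores for ten student-question pairs on three tests; the helper `gain_loss` and the score lists are illustrative and are not data from the study.

```python
def gain_loss(pre, post):
    """Return (G, L, gamma) for paired lists of 0/1 scores, per (7.a)-(7.c)."""
    gained = sum(1 for a, b in zip(pre, post) if a == 0 and b == 1)
    lost = sum(1 for a, b in zip(pre, post) if a == 1 and b == 0)
    correct_pre = sum(pre)
    incorrect_pre = len(pre) - correct_pre
    G = gained / incorrect_pre            # normalized gain
    L = lost / correct_pre                # normalized loss
    gamma = correct_pre / incorrect_pre   # aspect ratio (odds of a correct pre-test answer)
    return G, L, gamma

# Hypothetical scores of one group on ten student-question pairs.
T0 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
T1 = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
T2 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

# Marginal analysis: each period is measured against its own pre-test.
marginal = {"T0->T1": gain_loss(T0, T1), "T1->T2": gain_loss(T1, T2)}
# Cumulative analysis: every period is measured against the initial test T0.
cumulative = {"T0->T1": gain_loss(T0, T1), "T0->T2": gain_loss(T0, T2)}

print(marginal)
print(cumulative)
```

In the marginal table the aspect ratio changes between rows because the baseline moves from T0 to T1, while in the cumulative table it stays fixed at its T0 value.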

V. Application. 

The illustration presented here uses multi-stage pre/post testing to study the effectiveness of a 
course in differential calculus taught by the author to 125 plebes at the United States Merchant 
Marine Academy over a seven-year period from the fall of 1999 to the fall of 2005. The course
was taught using the studio method, a form of interactive engagement; see (Ecker, 1996a,
1996b), (Hake, 1998) and (Sokoloff et al., 1997). Methodology and other details are discussed
in the next section, before the results are presented.

A. Study Procedures. 

This section discusses details of the teaching methods, the pre-instruction test, the pre- and
post-instruction testing, as well as the student participants.

Teaching Method. Studio sections of the differential calculus course were taught using a 
modified form of the integrated lecture-laboratory-recitation design envisioned by Ecker (1996a, 
1996b). In the classic studio setting, small groups of students work on in-class activities while 
receiving constant guidance and help from the instructor. There is essentially no lecture, and 
although there is homework, there is little use of out-of-class projects requiring written and/or 
oral reports. 

The modified studio format used here incorporated instructor demonstrations of 
interactive Maple applications as well as out-of-class group projects. Instructor demonstrations 
exploited the computer algebra system’s facility to perform “what if” scenarios in real time,
giving students the opportunity to rapidly and efficiently test preconceptions and correct 
misconceptions. Often the demonstrations were used in conjunction with classic studio activities 
from the text Studio Calculus (Ecker, 1996b). On the premise that teaching is the best way to 
learn, the out-of-class group projects required studio students to construct interactive multimedia 
learning aids for use by other students. 

Pre-Instruction Test. The pre-instruction (diagnostic) instrument included twenty-four 
multiple-choice questions concerning core concepts and skills associated with differential 
calculus. Specific topics include functions, limits, continuity, differentiation, as well as 
applications; see (Dellwo, 2000) for details. The diagnostic questions were typical of practice 
problems used to prepare for standardized exams. However, the questions were not vetted to the 
same degree as those developed for the Force Concept Inventory (Hestenes et al., 1992) or the
more recent Calculus Concept Inventory (Epstein, 2007). 

Pre- and Post-Instruction Testing. Each year, the pre-instruction test was administered on 
the first day of class. Post-instruction testing employed a regimen of quizzes intended to give
each midshipman two post-instruction opportunities to answer the twenty-four diagnostic 
questions. Typically: 

• The first post-instruction quiz was composed of diagnostic questions on topics covered 
in class during the first or second week of the term. 

• The second post-instruction quiz was composed of questions repeated from the first 
post-instruction quiz and diagnostic questions on topics covered during the second or 
third week of the term. 

• The third post-instruction quiz was composed of questions repeated from the first
post-instruction quiz but not used on the second post-instruction quiz, questions repeated
from the second post-instruction quiz, and diagnostic questions on topics covered in 
class during the third or fourth week of the term. 

This process of combining and recombining questions from the pre-instruction test
continued until the end of the term and generally resulted in eight to ten quizzes of
approximately seven questions each. Thus, none of the quizzes contained all the diagnostic
questions given on the first day of class. Rather, each quiz contained a subset of diagnostic
questions on topics discussed in class prior to giving that quiz. Consequently, the pre-instruction
test (T0) and the post-instruction tests (T1, T2) contained the same twenty-four diagnostic
questions, but administered the questions in different ways. The pre-instruction test administered
all the questions on the first day of class while the post-instruction tests administered the
questions a few at a time on quizzes spread throughout the term.

The pre-instruction test and the post-instruction quizzes were scored by assigning a 
numerical value of 1 to correct answers and a numerical value of 0 to incorrect answers. By 
term’s end each student had accrued three scores for each of the twenty-four diagnostic 
questions. For a particular student answering a particular question these scores are: 

• S0 = 1 or 0 depending on whether the student answered the question correctly on the
pre-instruction test.

• S1 = 1 or 0 depending on whether the student answered the question correctly on the
first post-instruction opportunity.

• S2 = 1 or 0 depending on whether the student answered the question correctly on the
second post-instruction opportunity.

There are N×M values of S0 for a group of N students answering M questions. These
values of S0 determine the numerator in equation (5.b) and consequently the average grade on T0.
For example, the numerator for all studio sections under study was computed by summing the
values of S0 over all diagnostic questions and all studio students. Values of S1 determine the
numerator in (5.c) for the first post-instruction test and consequently the average grade on T1.
Similarly, values of S2 determine the average grade on T2.

The gain and loss components in equations (7.a) and (7.b) were computed in a similar
fashion. For instance, the numerator in (7.a) for the studio gain from T0 to T2 was obtained by
summing the values of max(S2 − S0, 0) over all diagnostic questions and all midshipmen in the
studio sections.
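The scoring procedure just described can be sketched in a few lines. The score table below is hypothetical (two students, three questions), and the loss numerator max(S0 − S2, 0) is an assumption made by symmetry with the gain numerator; the text states only that the loss was computed in a similar fashion.

```python
# scores[student][question] = (S0, S1, S2), each 1 (correct) or 0 (incorrect).
# Hypothetical data for N = 2 students and M = 3 questions.
scores = [
    [(0, 1, 1), (1, 1, 1), (1, 0, 1)],
    [(0, 0, 1), (0, 1, 0), (1, 1, 0)],
]

triples = [t for student in scores for t in student]
N, M = len(scores), len(scores[0])

correct_T0 = sum(s0 for s0, _, _ in triples)                # numerator of (5.b)
gain_T0_T2 = sum(max(s2 - s0, 0) for s0, _, s2 in triples)  # numerator of (7.a)
loss_T0_T2 = sum(max(s0 - s2, 0) for s0, _, s2 in triples)  # numerator of (7.b), assumed form

theta_pre = correct_T0 / (N * M)          # average grade on T0
G = gain_T0_T2 / (N * M - correct_T0)     # normalized gain from T0 to T2
L = loss_T0_T2 / correct_T0               # normalized loss from T0 to T2

print(correct_T0, gain_T0_T2, loss_T0_T2, theta_pre, G, L)
```

Because each score is 0 or 1, max(S2 − S0, 0) equals 1 exactly when a pre-test mistake is corrected, so the sum counts corrected mistakes directly.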

Questions appearing on the calculus diagnostic test were never modified, but the optional 
choices were rearranged from time to time. Although this method has the disadvantage of using 
the same questions several times, it has the overriding advantage of eliminating any possibility 
that test questions revised for use on a later test could introduce ambiguities resulting in false 
gains and/or false losses. The technique eliminates the difficult, if not impossible, task of 
establishing equivalencies between seemingly similar questions. 

Students. Midshipmen taking the studio course had some calculus in high school and 
demonstrated strong algebraic skills on a screening test given to all plebes. The course was 
taught in an electronic classroom that limited the number of students to twenty, but enrollment 
varied between fifteen and twenty students per year. 

B. Effectiveness of Studio Calculus.

This section illustrates the use of a two-stage assessment scheme to study intra-term variations in 
effectiveness for a studio course in differential calculus. A marginal analysis is presented first, 
then a cumulative analysis. 
Marginal Effectiveness. The results of a marginal analysis of gains and losses for the
studio course are tabulated in Table 1. When reviewing the table, keep in mind that the
inter-period test (T1) is used as a pre- and a post-instruction test. Consequently, the aspect ratio
changes from one period to the next. For example, the value 1.46 of the aspect ratio for the first
instructional period is obtained from (7.c) using the average score on T0. The value 3.09 for the
second period is obtained from (7.c) using the average grade on T1.

In addition, normalized gain and loss are conditioned on events that change from period
to period. For example, in Table 1 the value 0.14 for the normalized loss from T0 to T1 means
that 14% of diagnostic questions answered correctly on T0 were answered incorrectly on T1. The
value 0.71 for normalized gain from T1 to T2 means that 71% of questions answered incorrectly
on T1 were answered correctly on T2.


Table 1. Marginal analysis of gains and losses for the studio course. The initial pre-instruction test is
designated by T0, the first post-instruction test by T1, and the second post-instruction test by T2. Error estimates
employ the standard deviation. The estimates for γ, γL, and g are based on conventional linearization techniques;
see (Taylor, 1982) and the discussion in (Hake, 1998, p. 73).

Instructional   Aspect Ratio;   Normalized   Renormalized   Normalized   Normalized
Period          γ               Loss; L      Loss; γL       Gain; G      Change; g

T0 to T1        1.46±0.05       0.14±0.01    0.20±0.01      0.60±0.01    0.40±0.02
T1 to T2        3.09±0.13       0.07±0.01    0.22±0.02      0.71±0.02    0.49±0.03


Data in Table 1 indicates that renormalized loss was nearly constant from one
instructional period to the next, with γL ≈ 0.21. Although nominal values of γL increased
slightly, the increase is not large enough to be statistically significant. On the other hand, the
difference in normalized gain is large enough to conclude that G increased from one period to the
next.

In summary, for the studio course renormalized loss remained stable while normalized
gain increased; and (8.a) leads to the conclusion that marginal effectiveness of the course
improved from one period to the next. Moreover, according to (9), successive differences in
nominal values of normalized change listed in Table 1 quantify the added effectiveness. The data
indicates the studio course was 22.5% more effective in boosting learning relative to
performance on T1, when the odds of a correct answer were γ_1 ≈ 3 and the average grade was
75%, than it was relative to performance on T0, when the odds of a correct answer were γ_0 ≈ 1.5
and the average grade was 60%.
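The rounded figures quoted above can be cross-checked against (7.c) and Table 1. The arithmetic below is a sketch that assumes the 22.5% figure is the relative increase in normalized change between the two periods; the labels g_{01} and g_{12} for the two tabulated values of g are introduced here for illustration only.

```latex
\[
\gamma_0 \approx \frac{0.60}{1 - 0.60} = 1.5, \qquad
\gamma_1 \approx \frac{0.75}{1 - 0.75} = 3, \qquad
\frac{g_{12} - g_{01}}{g_{01}} = \frac{0.49 - 0.40}{0.40} = 0.225
\]
```

Inverting (7.c) with the tabulated aspect ratios gives θ = γ/(1 + γ), that is 1.46/2.46 ≈ 0.59 and 3.09/4.09 ≈ 0.76, consistent with the rounded average grades of 60% and 75%.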

Cumulative Effectiveness. Table 2 tabulates the results of a cumulative analysis of gains
and losses for the studio course. When reviewing the table, keep in mind that for cumulative
periods of instruction, change is measured relative to the initial diagnostic test, T0. Consequently,
the aspect ratio (7.c) has a fixed value.

Similarly, normalized gain and normalized loss are defined relative to performance on T0.
For example, the value 0.08 for the normalized loss from T0 to T2 means that 8% of diagnostic
questions answered correctly on T0 were answered incorrectly on T2. The value 0.81 for the
normalized gain from T0 to T2 means that 81% of diagnostic questions answered incorrectly on
T0 were answered correctly on T2.
Table 2. Cumulative analysis of gains and losses for the studio course. The initial pre-instruction test is
designated by T0, the first post-instruction test by T1, and the second post-instruction test by T2. Error estimates
employ the standard deviation. The estimates for γ, γL, and g are based on conventional linearization techniques;
see (Taylor, 1982) and the discussion in (Hake, 1998, p. 73).

Instructional   Aspect Ratio;   Normalized   Renormalized   Normalized   Normalized
Period          γ               Loss; L      Loss; γL       Gain; G      Change; g

T0 to T1        1.46±0.05       0.14±0.01    0.20±0.01      0.60±0.01    0.40±0.01
T0 to T2        1.46±0.05       0.08±0.01    0.11±0.01      0.81±0.01    0.69±0.01


Inspection of Table 2 reveals that normalized gain was larger during the period from T0
to T2 than during the period from T0 to T1. Also renormalized loss was smaller from T0 to T2 than
from T0 to T1. Consequently, the studio course was more effective in promoting learning relative
to T0 during the two instructional periods from T0 to T2 than during the single period from T0 to
T1 by an amount equal to the difference in normalized change Δg = 0.69 − 0.40 = 0.29, see (9).
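As a rough check, the two value-added terms in (9) can be read off the rounded entries of Table 2. Here ΔG denotes the increase in normalized gain and Δ(γL) the reduction in renormalized loss; both symbols are shorthand introduced for this sketch.

```latex
\[
\Delta G = 0.81 - 0.60 = 0.21, \qquad
\Delta(\gamma L) = 0.20 - 0.11 = 0.09, \qquad
\Delta G + \Delta(\gamma L) = 0.30 \approx \Delta g = 0.29
\]
```

The difference of 0.01 between the two totals is presumably a rounding effect in the tabulated entries, and it lies within the quoted error estimates.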

VI. Concluding Remarks: Was Hake Correct? 

In 1998 Richard Hake published the results of a large survey of pre/post test data for introductory 
physics courses. He estimated the average normalized change for traditional (T) courses, those
that made little use of interactive engagement (IE), at g_T ≈ 0.23. He estimated the average
normalized change for courses that made substantial use of IE methods at g_IE ≈ 0.48. These
findings are noteworthy because the estimate g_IE ≈ 0.48 for interactive courses is almost two
standard deviations above the estimate g_T ≈ 0.23 for traditional courses. Hake (1998) concluded:

Classroom use of IE strategies can increase mechanics-course effectiveness well beyond 
that obtained in traditional practice. 

Hake’s conclusion is certainly valid on the basis of (2), but is it valid on the basis of (8)? 
At present a complete answer cannot be given, since the average values of G and γL for
traditional and interactive physics courses are not known. However, the decomposition (3) can 
be used to obtain a partial answer. 

Since (3) is an identity, Hake’s findings imply that assessment states for traditional
courses must be distributed near the contour g ≈ 0.23 in the (G, γL) plane. Furthermore, the
mean assessment state for traditional courses must lie on this contour. Similarly, assessment
states for interactive courses must be distributed near the contour g ≈ 0.48 and the mean IE state
must lie on that contour. See Figure 2.

If the mean IE state falls along the middle portion of the contour g ≈ 0.48, shown in
Figure 2, then (8) implies IE methods are more effective than traditional methods because on
average they exhibit higher normalized gains and smaller renormalized losses. On the other
hand, if the mean IE state falls along the upper or lower portions of the contour, as indicated in
Figure 2, then (8) implies the two methods are not comparable because one method produces
larger gains while the other produces smaller renormalized losses.
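The extent of the middle portion can be worked out from (3) and (8.a). The following sketch treats the two estimates as exact, writes (G_T, γ_T L_T) and (G_IE, γ_IE L_IE) for the hypothetical mean states, and reads (8.a) as allowing a tie in one component; all of these are assumptions made for illustration.

```latex
\[
\begin{aligned}
&\gamma_T L_T = G_T - 0.23, \qquad \gamma_{IE} L_{IE} = G_{IE} - 0.48 \\
&G_{IE} \ge G_T \ \text{ and } \ \gamma_{IE} L_{IE} \le \gamma_T L_T
\iff G_T \le G_{IE} \le G_T + 0.25 \\
&G_T \ge G_{IE} \ \text{ and } \ \gamma_T L_T \le \gamma_{IE} L_{IE}
\iff G_T + 0.25 \le G_{IE} \le G_T \quad \text{(impossible)}
\end{aligned}
\]
```

Under these assumptions the middle portion of the contour g = 0.48 is the segment on which the mean IE gain exceeds the mean traditional gain by no more than 0.25, and no point of that contour makes the traditional state the more effective one.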

Thus, if (8), rather than (2), is used to gauge effectiveness, Hake’s data implies that
traditional physics courses cannot, on average, be more effective than interactive courses. That
is, the traditional approach is either less effective than the interactive approach or the two
methods are not comparable. Although this statement is not as strong as Hake’s original
statement, future efforts to determine the average values of G and γL for traditional and
interactive physics courses may make it possible to say more, even more than Hake originally
envisioned:

Classroom use of IE strategies can promote both the acquisition and the retention of 
mechanics-course material well beyond that obtained in traditional practice. 

See Dellwo (2009) for additional commentary on the utility of Hake’s gain. 


[Figure 2: plot in the (G, γL) plane with normalized gain on the horizontal axis. Hypothetical assessment states for traditional courses fall near the contour g = 0.23 and hypothetical assessment states for interactive courses fall near the contour g = 0.48. The contour g = 0.48 is divided into a lower section, a middle portion, and an upper section. IE methods are more effective than traditional methods if the mean assessment state for interactive courses falls on the middle portion of the contour g = 0.48.]

Figure 2. The contours of g = G − γL are parallel lines in the (G, γL) plane. The hypothetical mean
traditional state falls on the contour g = 0.23 while the mean IE state falls on g = 0.48.


Disclaimer 

The opinions expressed are those of the author and not the U.S. Merchant Marine Academy or 
the U.S. Department of Transportation. 